ABSTRACT



A computer implemented method analyzes an execution of 
a program. The method partitions the program into program 
components such as basic blocks and procedures. A source 
or executable representation of the program is instrumented 
to collect test coverage data. In addition, a flow graph 
representing the program components is generated. The 
program is then executed to collect test coverage data. Using 
the test coverage data and the flow graph, the program is 
partitioned into executed and unexecuted components. The 
number of instructions in each unexecuted program com- 
ponent is counted. Thus, a list of the unexecuted program 
components can be presented according to a decreasing 
order of the number of unexecuted instructions in the 
unexecuted program components. 

15 Claims, 9 Drawing Sheets 
METHOD FOR ANALYZING AND PRESENTING TEST EXECUTION FLOWS
OF PROGRAMS

FIELD OF THE INVENTION

This invention relates generally to computer systems, and
more particularly to analyzing execution flows during the
testing of programs.

BACKGROUND OF THE INVENTION

An important step in producing a high quality software
program is to test the program before it is generally released.
A poorly tested program increases production costs and
time, and decreases user satisfaction. As programs become
more complex, it becomes more difficult to ensure that a
program has been adequately tested. An adequate test
exercises as much functionality of the program as possible
with input data that are representative of real data
processing problems. The testing should reveal whether the
program has any flaws in its design.

One way of verifying program testing is called "test
coverage analysis." Test coverage analysis measures the
scope and quality of a test by determining how much of the
program was exercised during the test. Typically, test
coverage analysis requires three major steps. First, the
program to be tested is modified so that its execution can be
monitored. The modifications generally entail adding
instructions to the program. The "monitoring" instructions
divert the execution flow to monitoring procedures which
can record the fact that a portion of the monitored program
has been executed.

Normally, code modification, or "instrumentation" as this
is sometimes known, is done in one of two ways. Either the
source code is modified and the modified program is
compiled as usual, or an executable image of the program is
modified.

In either case, during a second step the modified code is
loaded for execution into a memory of a computer system.
During execution of the modified program, the execution
flow is intercepted at the instrumented instructions. The
monitoring procedures can then gather and record test
coverage data which are indicative of the operation of the
program.

During a final step, the test coverage data are analyzed to
determine the behavior of the program in response to a
particular set of input data. During this step, it can be
determined whether all portions of the program were
adequately tested. If not, the input test data can be
modified, and the testing cycle can be repeated until a
significant portion of the program is tested.

Conventionally, test coverage data have been presented as
an annotated listing of the source code. Table 1 shows such
an example listing.

TABLE 1

In the table, each source code line is preceded by a line
number and a code. The code "++" indicates that the line
was executed during the test. The code "--" indicates that
the line was not executed. The uncoded lines do not have any
equivalent expression in the machine executable image.

In this example listing, it can be seen that during the test,
the test on line 126 always failed because line 127 was never
executed. The test on line 128 always succeeded because
line 129 was executed, but line 131 was not.

This manner of presenting test coverage data is relatively
easy to understand for small, simple programs. However, real
programs may include thousands of lines of code distributed
over possibly hundreds of source code files. Program
developers rarely read source files serially from start to
end, but this is what this type of presentation demands.

More importantly, this type of presentation does not
readily distinguish what is important from what can be
ignored. In practice, test coverage of an entire program can
never be complete. For various reasons, such as program
complexity and the many possible input states, only about
80% to 90% coverage is typically attainable. The remaining
unexercised code can be scattered over tens or hundreds of
source files. A serial perusal of the entire presentation to
uncover potential design flaws can consume extra time, and
introduce errors.

A more useful presentation of test coverage data would
take into consideration the logical structure of the program,
and quickly suggest how to change the test to cause more of
the code to be executed. Moreover, the test coverage
information would be presented to a user in a form which is
not necessarily organized serially according to the order of
the instructions. In contrast to the linear listing above, a
useful presentation would focus on the significant
unexecuted portions of the program so that testing could be
improved.

SUMMARY OF THE INVENTION

The invention provides a computer implemented method
for monitoring an execution of a program. The program is
instrumented by inserting call instructions into the program.
The call instructions intercept the execution of the program
and can direct the execution to monitoring routines for
collecting execution data.

In addition, a partial or complete flow graph is generated
for the program. The flow graph represents the logical
structure of the program as nodes, and possible paths of
machine code as directed arcs connecting the nodes. The
nodes can be components of the program, for example,
groups of individual machine executable instructions called
basic blocks. A basic block is a group of instructions that are
executed together and that have one entry point and one exit
point.

The directed arcs which link the nodes begin at an exit
point of a source block and end at an entry point of a
destination block. Exit points which are flow control
instructions, for example conditional branch and jump
instructions, are called decision points. A basic block or
procedure ending in a decision point has more than one arc
leaving the block, and therefore multiple possible
destinations.

For example, in a block ending with a conditional branch
instruction, one arc ends at the block which is the
destination if the branch is taken, and the other arc ends at
the next sequential block, which is reached if the branch is
not taken.

Basic blocks corresponding to each source code procedure
are further identified and grouped, and the entry and exit
points of the procedures are identified. In the graph,
procedures are linked by connecting a block containing a call
instruction with an arc ending at the destination procedure
reached by executing the call instruction.
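The construction of basic blocks and arcs described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the `Instr` class, the opcode names (`branch` for a conditional branch, `jump`, `ret`) and the function `basic_blocks` are hypothetical, and block boundaries are found with the conventional "leader" rule (the first instruction, every branch or jump target, and every instruction following a flow control instruction start a new block).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str                       # "branch" (conditional), "jump", "ret", or any other opcode
    target: Optional[int] = None  # index of the destination instruction for branch/jump


def basic_blocks(code: list[Instr]) -> tuple[list[range], dict[int, list[int]]]:
    """Split a linear instruction list into basic blocks and connect them with arcs."""
    # Leaders: the first instruction, every branch/jump target, and every
    # instruction that follows a flow control instruction.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.op in ("branch", "jump") and ins.target is not None:
            leaders.add(ins.target)
        if ins.op in ("branch", "jump", "ret") and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [range(s, e) for s, e in zip(starts, starts[1:] + [len(code)])]
    block_of = {s: n for n, s in enumerate(starts)}

    succ: dict[int, list[int]] = {}
    for n, blk in enumerate(blocks):
        last = code[blk[-1]]
        arcs = []
        if last.op in ("branch", "jump") and last.target is not None:
            arcs.append(block_of[last.target])  # branch taken, or unconditional jump
        if last.op not in ("jump", "ret") and blk.stop < len(code):
            arcs.append(n + 1)                  # fall through to the next sequential block
        succ[n] = arcs
    return blocks, succ


# Hypothetical toy listing with two conditional branches.
code = [
    Instr("cmp"), Instr("branch", 3),  # 0-1: if the condition holds, go to 3
    Instr("mov"),                      # 2
    Instr("cmp"), Instr("branch", 6),  # 3-4
    Instr("call"),                     # 5
    Instr("ret"),                      # 6
]
blocks, succ = basic_blocks(code)
```

In this toy listing, blocks 0 and 2 end in conditional branches, so each is a decision point with two outgoing arcs: one to the branch target and one to the next sequential block. Block 4 ends in a return and has no outgoing arc.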
Some indirect call instructions may call different
procedures for different executions of the program. For
these, no links or arcs are made. Instead, each procedure
which is a possible destination reached by an indirect call
instruction is marked as an indirectly called procedure.

After the construction of arcs to called procedures and the
marking of indirectly called procedures, some remaining
procedures may neither have arcs from direct call
instructions nor be indirectly called; these are unreachable
procedures which cannot be reached by any possible
execution of the program.

The preferred embodiment of the invention uses the flow
graph to represent the entire machine executable program.
Alternative embodiments may use flow graphs constructed
directly from the source code, or from some intermediate
representation of the program, and may construct a flow
graph for only a portion of the program.

During execution of the program, the monitoring routines
collect test coverage data. The test coverage data are then
used to partition the flow graph into executed and
unexecuted program components. Some of the unexecuted
components contain unexecuted destination points. An
unexecuted destination point is an entry point of an
unexecuted component where the test execution could have
executed, but did not execute, a portion of the program.

In one aspect of the invention, an unexecuted destination
point is identified as an entry point of an unexecuted
program component that is the destination of an executed
control flow instruction where one of the tested conditions
was never encountered. In another aspect of the invention,
an unexecuted destination point is identified as the entry
point of an unexecuted procedure which may be indirectly
called. In yet another aspect of the invention, an unexecuted
destination point is an entry point of an unreachable
procedure.

For each unexecuted destination point, the amount of
unexecuted source code or machine executable code which
could have been reached by executing the component is
determined. The unexecuted destination points are then
presented in decreasing order of this amount. The
presentation indicates to the test designer the points in the
program that are opportunities to execute more of the
program. The presentation also naturally suggests changes
which can be made to the test to increase test coverage.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram of a process according to the
invention;
FIG. 2 is a flow diagram of a process for constructing a
flow graph of a program to be tested;
FIG. 3 is a flow graph of the components of the program
to be tested;
FIG. 4a is a flow diagram of a process for analyzing and
presenting test coverage data;
FIG. 4b is a block diagram of the program partitioned into
executed and unexecuted program components;
FIG. 5 is a colored flow graph;
FIG. 6 is a flow diagram of a process for presenting test
coverage data;
FIG. 7 is a presentation of test coverage data of a sample
program; and
FIG. 8 is a presentation of test coverage data for a
compression program.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

FIG. 1 shows the general flow of data and processes 100
for analyzing and presenting test coverage data of a program
according to the preferred embodiment of the invention. A
program can be conventionally prepared as source code
modules or files 110. A compiler 120 generates
corresponding object code modules 130. A linker 140
combines the object code modules 130, perhaps with library
routines 131, into an executable image 150.

A monitor 160 makes modifications to the image 150 for
the purpose of monitoring the execution of the program. For
example, the monitor 160 produces a modified executable
program 170 by inserting calls 173 in the executable image
150. The calls intercept and divert the execution flow to
monitoring procedures 172. The procedures 172
dynamically collect test coverage data while the program
170 is executing. Alternatively, the intercepting instructions
can be inserted directly into the source code 110, if
available.

The procedures 172 can be linked to the modified program
171 by the monitor 160. Alternatively, a loader 180 can
dynamically link and load the procedures 172 at execution
time from a library (dll) 174. The monitor can also construct
a flow graph 300 of the program. The flow graph can be
complete or partial. A partial graph represents only the
portions of the program to be analyzed; a complete flow
graph covers the entire program.

The program is loaded into a computer system 190. The
system 190 can include a processor (P) 191, memories (M)
193, and input/output interfaces (I/O) 192 connected by a
bus. The computer 190, or a similar computer, can perform
any of the process steps such as compiling, linking,
instrumenting, loading, executing, analyzing, and presenting
as detailed herein.

During execution of the program 170, test coverage data
410 can be collected by the procedures 172. The test data
can be stored in the memories 193. Subsequently, the data
can be analyzed by an analyzer and presenter 198, in
conjunction with an enhanced (colored) flow graph 500, to
produce a test coverage presentation 199 that indicates how
the portions of the program were executed.

FIG. 2 shows the sub-steps of a process 200 for generating
the flow graph 300. The flow graph 300 can, for example, be
statically constructed by the monitor 160. In step 210, the
program is partitioned into procedures defined in the source
code 110. Each procedure is partitioned into fundamental
program components such as basic blocks. A basic block is
defined as a sequence of machine executable instructions
that are all executed if any instruction of the block is
executed. A basic block has a single entry point and a single
exit point, and the execution flow in the basic block is
predictably linear.

As shown in greater detail in FIG. 3, the program of Table
1 above can be represented as basic blocks 312-316. Each
basic block is a rectangular node in the graph 300. In step
220 of FIG. 2, the basic blocks 312-316 are connected by
edges (directed arcs) 318. The edges 318 indicate the
possible execution flows among the basic blocks 312-316.
For example, the edges represent branch or jump (flow
control) instructions at the ends of the basic blocks and
procedures. These flow control instructions direct the
execution of the program to its different component parts
during a particular execution.

In step 230 of FIG. 2, a collection of basic blocks can be
grouped into a higher level of program components called
procedures 310 and 320. Procedures are sequences of
instructions that have well defined entry and exit points, for
example 311 and 319, and 321 and 329. Typically a
procedure is accessed by a "call" instruction, and the
execution of a procedure is terminated by a "return"
instruction. Once the components and execution flow of the
program have been determined, the flow graph 300 can be
constructed in step 240.
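The linking of procedures described above (arcs for direct calls, marks for possible targets of indirect calls) might look like the following sketch. The call-site encoding and the procedure names are hypothetical, and the set of possible indirect-call targets is assumed to be supplied by some earlier analysis, for example the set of procedures whose address is taken; the text does not say how those targets are determined.

```python
from typing import Optional

# Hypothetical encoding: ("direct", callee_name) or ("indirect", None).
CallSite = tuple[str, Optional[str]]


def link_procedures(
    call_sites: dict[str, list[CallSite]],
    possible_indirect_targets: set[str],
) -> tuple[dict[str, set[str]], set[str]]:
    """Add an arc for every direct call and mark possible indirect-call targets."""
    arcs: dict[str, set[str]] = {name: set() for name in call_sites}
    for caller, sites in call_sites.items():
        for kind, callee in sites:
            if kind == "direct" and callee is not None:
                arcs[caller].add(callee)
            # Indirect call sites get no arc: the callee is known only at run time.
    indirectly_called = {name for name in call_sites if name in possible_indirect_targets}
    return arcs, indirectly_called


call_sites = {
    "main": [("direct", "p"), ("indirect", None)],
    "p": [("direct", "q")],
    "q": [],
    "on_event": [],                 # assumed to be reachable only through a pointer
    "old_helper": [("direct", "q")],
}
arcs, indirectly_called = link_procedures(call_sites, {"on_event"})
```

With this input, `on_event` is marked rather than linked, while `old_helper` ends up with neither an incoming arc nor a mark, which is the situation the text describes for unreachable procedures.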

In FIG. 3, the sample program of Table 1 has been 
partitioned into procedure (p) 310 and procedure (q) 320. 
Procedure p 310 includes basic blocks 312-316. The pro- 
cedure begins at entry point 311, and completes at return 
319. The various edges 318 indicate the possible execution 
flows through the procedure p 310. Procedure q 320 has a 
body 322, the details of which are not shown for 
conciseness, and entry and exit points 321 and 329. 

FIG. 4a shows the sub-steps of a process for analyzing and
presenting the test coverage data in greater detail. The basic
test coverage data 410 are collected as the program is
executing. For example, if the first instruction (entry point)
of every basic block is instrumented, then the procedures
172 of FIG. 1 can record the execution of a basic block
when the execution flow is dynamically intercepted during
testing. These data can be stored in the memories 193. Since
unexecuted components are never intercepted, no test
coverage data are collected for them. One goal of the
invention is to minimize the number of blocks and
procedures that have no collected test coverage data.
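The block-entry instrumentation described above can be simulated in Python by inserting the calls by hand. In this sketch, `record_block` stands in for a monitoring procedure 172 and the `Counter` stands in for the test coverage data 410; the procedure `classify` and its block names are hypothetical, and a real monitor would insert the calls into the executable image rather than into the source.

```python
from collections import Counter

# Stand-in for test coverage data 410: how often each basic block was entered.
coverage: Counter = Counter()


def record_block(block_id: str) -> None:
    """Stand-in for a monitoring procedure: runs whenever an inserted call fires."""
    coverage[block_id] += 1


def classify(x: int) -> str:
    """Hypothetical procedure, instrumented by hand: one record_block call
    sits at the entry point of each of its basic blocks."""
    record_block("entry")
    if x < 0:
        record_block("negative")
        return "neg"
    record_block("non_negative")
    if x == 0:
        record_block("zero")
        return "zero"
    record_block("positive")
    return "pos"


# A test run whose input data never include 0.
for test_input in (5, 7, -2):
    classify(test_input)

all_blocks = {"entry", "negative", "non_negative", "zero", "positive"}
unexecuted = all_blocks - set(coverage)  # blocks for which no data were collected
```

Because the `zero` block is never intercepted, it simply has no entry in `coverage`; changing the test input to include 0 would remove it from `unexecuted`.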

The test coverage data 410 and the flow graph 300 can be 
used to generate the colored flow graph 500 of FIG. 5 in step 
405. In FIG. 5, program components which have been 
executed are marked or colored (hashed in the Figure). Any 
unexecuted components are left unmarked or uncolored in 
the graph 500. 

In step 420, three lists 430 of unexecuted destination 
points are generated using the flow graph 300. A basic block 
list 431 has one entry (destination points BB) for every basic 
block which was not itself executed, but has a predecessor 
block which was executed. As shown in FIG. 5, there are two 
blocks 313 and 316 which qualify. These blocks are marked 
with asterisks (*). Each of these marked blocks is the 
successor of an "if" block which was executed, but from
which not all possible paths were taken. Therefore, each one 
of these unexecuted successor blocks defines an opportunity 
to execute more code. 
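Selecting the entries of list 431 amounts to scanning the arcs of the flow graph for an executed source block whose destination block was not executed. The sketch below assumes a hypothetical graph given as a successor map with letter-named blocks and a set of executed blocks; like FIG. 5, it yields two such blocks.

```python
def bb_destination_points(succ: dict[str, list[str]], executed: set[str]) -> list[str]:
    """List 431: unexecuted blocks that have at least one executed predecessor."""
    points: list[str] = []
    for block, targets in succ.items():
        if block not in executed:
            continue  # only arcs leaving executed blocks are of interest
        for target in targets:
            if target not in executed and target not in points:
                points.append(target)
    return points


# Hypothetical graph: A and C are "if" blocks; B and E were never entered.
succ = {"A": ["B", "C"], "B": ["C"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
executed = {"A", "C", "D", "F"}
bb_list = bb_destination_points(succ, executed)
```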

The second list 432 of FIG. 4a has entries (destination 
points IP) for unexecuted procedures which may be
indirectly called. In most programs, the majority of
procedures use direct calls, like the call from p to q in the
example program of
FIG. 5. With direct calls, the identity of the called procedure 
can statically be determined. 

However, some procedures may be called indirectly using 
a run-time variable. In those instances, the called procedure 
can only be determined dynamically, and the identity of the 
called procedure may depend on a particular execution of the
program with a particular set of input data. 

In terms of a flow graph, there are some call sites that call 
unknown procedures, and there are indirectly called proce- 
dures whose call sites are not known prior to execution. If 
an indirectly called procedure is never called during pro- 
gram execution, there is no way of knowing from which 
unexecuted call site it might have been entered. By way of 
example, procedure q is never executed. However, because 
it is called only directly, and never indirectly, no entry IP is 
generated for list 432. 
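Under the same kind of assumptions, list 432 is the set of procedures marked as indirectly called that received no coverage data. The procedure names below are hypothetical; as in the text, an unexecuted procedure that is only called directly, such as q, produces no IP entry because it was never marked.

```python
def ip_destination_points(indirectly_called: set[str], executed: set[str]) -> list[str]:
    """List 432: unexecuted procedures that may be called indirectly."""
    return sorted(p for p in indirectly_called if p not in executed)


executed_procs = {"main", "p", "on_click"}
indirectly_called = {"on_click", "on_key"}  # q is not here: it is only called directly
ip_list = ip_destination_points(indirectly_called, executed_procs)
```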

The third list 433 has one entry (destination points UP)
for every procedure which has no incoming edges in the call 
graph and which is not indirectly called. These procedures 
constitute "dead" code that is unreachable by any execution 
of the program. In some cases, dead code can account for as 
much as 30% of the program. For the purposes of test 
coverage analysis, dead code may represent code that has 
become superfluous during program development by, for
example, multiple authors, and can be removed. The lists 
430 can then be used to generate the presentation 199 in step 
440 of FIG. 4a. 
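List 433 can be derived from the call arcs and the indirect-call marks alone, without any coverage data. The sketch assumes the call graph is a map from each procedure to the procedures it calls directly (every procedure appears as a key). It additionally assumes that the program's entry procedure, here the hypothetical `main`, is excluded because it is started by the system rather than by a call instruction; the text does not discuss this case.

```python
def up_destination_points(
    call_arcs: dict[str, set[str]], indirectly_called: set[str], entry: str
) -> list[str]:
    """List 433: procedures with no incoming call edge that are not indirectly called."""
    has_caller = {callee for callees in call_arcs.values() for callee in callees}
    return sorted(
        name
        for name in call_arcs
        if name not in has_caller and name not in indirectly_called and name != entry
    )


call_arcs = {
    "main": {"p"},
    "p": {"q"},
    "q": set(),
    "on_event": set(),
    "old_helper": {"q"},
}
up_list = up_destination_points(call_arcs, {"on_event"}, entry="main")
```

This follows the text in testing only for incoming edges. A procedure called only from other dead procedures would still have an incoming edge and would not be listed here; a reachability search from the entry procedure would catch that case as well.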

As shown in FIG. 4b, the colored flow graph 500 essentially
allows the program 150 to be partitioned into executed and
unexecuted portions 151-152. In FIG. 4b, procedure 153
includes one or more branches to basic blocks 154 (BB
entries 431) which were not executed. Procedures 156 are
unexecuted indirectly called procedures which were not
called from executed procedure 155 (IP entries 432);
therefore, no links exist. Procedures 157 are never called,
that is, there is no call edge from the executed portion 151
of the program 150 to these procedures.

The sub-steps for preparing the presentation 199 are
shown in FIG. 6. The lists 430 of FIG. 4a indicate the
unexecuted destination points of the program. The colored
flow graph 500 has the property that every unexecuted
instruction can be located by starting at the destination
points in the lists 430 and traversing only edges to other
unexecuted nodes. This is done in steps 610 and 620 of the
process of FIG. 6.

During the traversal of the unexecuted components, step
630 counts all instructions which are reachable but not
executed. The counts can be stored as a field in the entries
of the lists 430. The traversals stop when an executed
(colored) node is reached. For instance, a traversal starting
at the basic block q() 316 of FIG. 5 would count all
instructions of procedure q. The block i=j=0 313 would
count only itself.
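The traversal of steps 610 to 630 can be sketched as a depth-first search that enters only unexecuted nodes. The graph, block names and per-block instruction counts are hypothetical: block E contains a call to a procedure q made of blocks q.1 to q.3, so, as with block 316, counting from E includes every instruction of q, while counting from B stops at its executed successor, as with block 313.

```python
def count_reachable_unexecuted(
    start: str,
    succ: dict[str, list[str]],
    size: dict[str, int],
    executed: set[str],
) -> int:
    """Count instructions reachable from `start` through unexecuted nodes only."""
    seen: set[str] = set()
    stack = [start]
    total = 0
    while stack:
        node = stack.pop()
        if node in seen or node in executed:
            continue  # count each node once; stop at executed (colored) nodes
        seen.add(node)
        total += size[node]
        stack.extend(succ.get(node, []))
    return total


# Hypothetical graph: E ends in a call to q, so its arcs lead into q and on to F.
succ = {
    "A": ["B", "C"], "B": ["C"], "C": ["D", "E"], "D": ["F"],
    "E": ["q.1", "F"], "q.1": ["q.2", "q.3"], "q.2": ["q.3"], "q.3": [], "F": [],
}
size = {"A": 3, "B": 2, "C": 3, "D": 4, "E": 2, "F": 1, "q.1": 5, "q.2": 7, "q.3": 2}
executed = {"A", "C", "D", "F"}
counts = {p: count_reachable_unexecuted(p, succ, size, executed) for p in ("B", "E")}
```

The `seen` set keeps a block that is reachable along two paths (here q.3) from being counted twice for the same destination point.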
Once the reachable unexecuted instructions have been
counted for the unexecuted destination points in the lists, the
components including these points can be sorted in step 640
in order of decreasing importance, that is, from the most to
the fewest reachable unexecuted instructions. The sorted
list can be presented in step 650.
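Steps 640 and 650 then reduce to a sort on the stored counts. The report format below is a hypothetical stand-in for the presentation of FIG. 7, whose exact layout is not given in the text; the entries reuse illustrative names and counts.

```python
def present(entries: list[tuple[str, str, int]]) -> list[str]:
    """Order (kind, destination point, count) entries from most to fewest
    reachable unexecuted instructions and format one line per entry."""
    ordered = sorted(entries, key=lambda e: e[2], reverse=True)
    return [f"{count:6d}  {kind}  {point}" for kind, point, count in ordered]


report = present([("BB", "B", 2), ("BB", "E", 16), ("UP", "old_helper", 40)])
for line in report:
    print(line)
```

Because `sorted` is stable, destination points with equal counts keep the order in which the lists produced them.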

FIG. 7 shows an example presentation 700 for the sample
program. The presentation 700 correctly shows that in order
to improve the test coverage, the input data should be
modified to ensure that procedure q is called from procedure
p. This is more important than causing the test i==1 to
succeed.

FIG. 8 shows the first entry in a presentation 800 for the 
well known SPEC benchmark program "compress," a file 
compression/decompression utility program. In the SPEC 
suite of programs, compress is used to encode a single file.
The methodology of the presentation as disclosed herein has
correctly revealed for this particular execution that the 
procedure for file decompression was never called. Since 
this procedure accounts for the largest portion of unexecuted 
code, an improved test coverage is immediately suggested. 

The foregoing description has been directed to specific
embodiments of this invention. It will be apparent, however,
that variations and modifications may be made to the
described embodiments, with the attainment of all or some
of the advantages. Therefore, it is the object of the appended
claims to cover all such variations and modifications as
come within the spirit and scope of the invention.
I claim: 

1. A computer implemented method for analyzing test 
coverage data of a program in a computer, comprising the 
steps of:

generating a flowgraph representing program components 
of the program and possible execution flows through 
the program components; 

collecting test coverage data while executing the program; 
determining executed and unexecuted program components
on the flowgraph using the collected test coverage
data; and 
presenting a list of unexecuted program components
according to a decreasing order of a number of unex- 
ecuted instructions reachable from each unexecuted 
program component. 

2. The method of claim 1 wherein the step of presenting 
further comprises the steps of: 

selecting unexecuted destination points from among the 

unexecuted program components; and 
counting the number of unexecuted instructions reachable 

from each of the selected unexecuted destination 

points. 

3. The method of claim 2 further comprising: 
selecting first unexecuted destination points that are tar- 
gets of executed flow control instructions. 

4. The method of claim 2 further comprising: 
selecting second unexecuted destination points that are 

targets of indirect procedure calls. 

5. The method of claim 2 further comprising: 
selecting third unexecuted destination points that are 

unreachable by any program component. 

6. The method of claim 1 further comprising: 
instrumenting a source code representation of the pro- 
gram. 

7. The method of claim 1 further comprising: 
instrumenting a machine executable code representation
of the program.

8. The method of claim 1 further comprising: 
generating a complete graph of the program. 

9. The method of claim 1 further comprising:
generating a partial graph of the program.

10. The method of claim 1 further comprising: 
identifying flow control instructions in the program which 

are decision points, the flow control instructions that 
can dynamically alter the execution flow to multiple 
destination points; 
adding calls to the program at each possible destination 
point, the calls to divert the execution flow to moni- 
toring routine for recording test coverage data in a 
memory of the computer. 

11. The method of claim 10 further comprising: 
linking the monitoring routine to the program when the 

program is loaded for execution. 

12. The method of claim 1 wherein the graph includes one 
node for each program component to be represented, and 
edges of the graph indicate possible execution flows 
between directly called program components. 

13. The method of claim 12 further comprising: 
beginning at a particular unexecuted destination point, 

traversing all edges of the graph linking unexecuted 
program components while counting until another 
executed program component is reached. 

14. The method of claim 12 further comprising: 
coloring nodes of the graph to indicate the unexecuted 

program components. 

15. The method of claim 1 wherein the program compo- 
nents are basic blocks and procedures. 